1. Field of the Invention
The present invention generally relates to a speech output system and a method thereof, and in particular to a speech synthesizer generating system and a method thereof.
2. Description of Related Art
Demand for automated services and devices has grown along with advances in technology, and speech output is one of the most commonly requested services. Speech guidance reduces the manpower required and allows services to be provided automatically. High-quality speech output is a user interface required by a wide variety of services. In particular, speech is the most natural, convenient, and secure form of information output on a mobile device with a limited display screen. In addition, audio books provide a very efficient learning method, especially for learning a foreign language.
However, existing speech output methods fall into two modes, each with its own disadvantages. The first mode, voice recording, is time-consuming and costly, and the recorded speech output cannot be changed. The second mode, speech synthesis, provides low-quality, inflexible speech output, and the speech is difficult to customize.
Referring to FIG. 1, a system and method for text-to-speech processing in a portable device are provided by AT&T in U.S. Pat. No. 7,013,282. According to this method, a user 130 inputs text into a desktop computer 110. The input text is then converted by a text-to-speech (TTS) module 112 in the desktop computer 110. Specifically, the text is converted into a speech output 118 by a text analysis module 114 and a speech synthesis module 116. In that disclosure, the TTS conversion is performed by the desktop computer 110, which has high computing capability, and the synthesized speech output 118 is transmitted from the desktop computer 110 to a handheld electronic device 120, which has lower computing capability. The speech output 118 produced by the TTS module 112 includes a carrier phrase and slot information and is transmitted to a memory of the handheld electronic device 120. The handheld electronic device 120 then concatenates and outputs the carrier phrases and slot information.
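The concatenation step performed on the handheld device can be illustrated with a minimal sketch. It is not taken from U.S. Pat. No. 7,013,282; the function name `concatenate_prompt`, the key naming scheme, and the placeholder byte strings standing in for synthesized audio are all assumptions made for illustration.

```python
from typing import Dict, List


def concatenate_prompt(clip_store: Dict[str, bytes], sequence: List[str]) -> bytes:
    """Join pre-synthesized clips in the requested order.

    clip_store models the handheld device's memory after the transfer;
    sequence lists the carrier-phrase and slot keys to play back.
    """
    missing = [key for key in sequence if key not in clip_store]
    if missing:
        # The handheld device cannot synthesize, so an untransferred clip is an error.
        raise KeyError(f"clips not transferred to device: {missing}")
    return b"".join(clip_store[key] for key in sequence)


# Hypothetical clips synthesized on the desktop computer and copied to the device.
# Real clips would hold audio samples; placeholder bytes are used here.
clip_store = {
    "carrier:your_next_meeting_is_at": b"<carrier-audio>",
    "slot:three_pm": b"<slot-audio-3pm>",
    "slot:four_pm": b"<slot-audio-4pm>",
}

prompt = concatenate_prompt(
    clip_store, ["carrier:your_next_meeting_is_at", "slot:three_pm"]
)
```

Under these assumptions the device performs only a lookup and a join, which is consistent with its lower computing capability, but it can output only the clips that were synthesized and transferred in advance.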
However, in the foregoing disclosure, the content to be converted by the TTS module cannot be changed, which makes the approach very inflexible. In addition, the speech synthesis module that the desktop computer 110 uses to synthesize the speech cannot be changed either. Moreover, the desktop computer 110 and the handheld electronic device 120 have to operate synchronously.
A speech synthesis apparatus and selection method are provided by HP in U.S. Pat. No. 6,725,199 and U.S. Pat. No. 7,062,439. These disclosures provide a method for assessing speech quality, wherein an “objective speech quality assessor” generates a confidence score for each speech-form utterance, and the utterance having the best confidence score is selected from among those produced by a plurality of TTS modules to improve the quality of the speech output. If there is only one TTS module, the text is rewritten into other texts having the same meaning, and the speech-form utterance of the rewritten text having the best confidence score is selected as the speech output.
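The selection logic described above can be sketched as follows. This is an illustrative outline only, not the implementation of the cited HP patents: the name `select_best_utterance`, the callable types, the convention that a higher score is better, and the choice to keep the original text among the candidates in the single-module case are all assumptions.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Synthesizer = Callable[[str], bytes]   # text -> speech-form utterance (hypothetical type)
Assessor = Callable[[bytes], float]    # utterance -> confidence score (hypothetical type)
Rewriter = Callable[[str], List[str]]  # text -> paraphrases with the same meaning


def select_best_utterance(
    text: str,
    synthesizers: Sequence[Synthesizer],
    assess: Assessor,
    rewrite: Optional[Rewriter] = None,
) -> Tuple[str, bytes]:
    """Return the (text, utterance) pair with the highest confidence score."""
    if not synthesizers:
        raise ValueError("at least one TTS module is required")
    if len(synthesizers) > 1 or rewrite is None:
        # Several TTS modules: synthesize the same text with each one.
        candidates = [(text, tts(text)) for tts in synthesizers]
    else:
        # Single TTS module: synthesize the original text and its paraphrases.
        # Keeping the original text as a candidate is an assumption of this sketch.
        tts = synthesizers[0]
        candidates = [(variant, tts(variant)) for variant in [text, *rewrite(text)]]
    # Assumes a higher score means better quality; ties keep the earliest candidate.
    return max(candidates, key=lambda pair: assess(pair[1]))
```

In this sketch the assessor is passed in as a plain function, so any scoring method can be substituted without changing the selection logic.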